Text File | 1994-05-24 | 9.7 KB | 255 lines

The Text Converter - TEC 1.0   (c) 1994 Martin Mares, MJSoft System Software
================================================================================

Copyright:
----------

TEC and its documentation are Copyright (c) Martin Mares, MJSoft System
Software, Prague, Czech Republic.

This archive can be freely redistributed, as long as all of its files are
included in their original form without any additions, deletions or
modifications, and no more than a nominal fee is charged for its
distribution. All copyright notices in the programs and accompanying
documentation files must remain in place. Also, '.displayme' and other
similar files may not be added. This is generally known as FREEWARE.

Special permission is given to Fred Fish to distribute this program on his
"Fish Disks".

This software is provided "AS IS" without warranty of any kind, either
expressed or implied. The author is not responsible for any damage caused
by it.

Introduction
------------

Almost every programmer sometimes needs to convert a text file into another
form (for example AmigaGuide to plain text, or stripping ANSI sequences),
and this usually results in writing a small utility to do the job. These
utilities are very similar to each other and most of them contain the same
routines for input and output buffering, because the buffered I/O provided
by the dos is terribly slow. This is the reason why I decided to write TEC.

TEC is a simple tool designed to simplify many text conversion tasks. It
acts as a one-input, one-output state machine with one internal string
register; for field-oriented conversion it would therefore be better to use
some other program (such as awk).

TEC requires OS 2.04 or higher and the ss.library.

TEC is pure and can be made resident.

Invocation:
-----------

TEC may be started only from the CLI and has the following parameters:

FILTER/M/A   - a list of filter programs to be applied.
               If any of them is enclosed in single quotes, it will be
               interpreted as a one-line program. If it's a file name, the
               default extension '.tec' will be appended. The input file is
               processed by the first filter and the result is passed as
               the input of the second filter, and so on; the output of the
               last filter is written to the output file. In many cases, a
               single filter is enough to do the job.

FROM/K, TO/K - names of the source and destination files. If they're
               omitted, standard input/output is used instead.

BUF/K/N      - buffer size in bytes. Default: 16384 bytes. Minimum: 16
               bytes.

The conversion may be stopped at any time by pressing CTRL-C or by sending
the break-C signal to TEC.

The language:
-------------

The basic elements of the language are:

- comments         - everything from '%' to the end of the line is ignored
                     (unless the percent sign is part of a character or
                     string constant)
- separators       - semicolon and the end-of-line character
- characters       - (a) specified by decimal code
                     (b) specified by hexadecimal code (preceded by $)
                     (c) character constant (enclosed in single quotes).
                         It may contain escape characters (see below).
- keywords         - COPY, MSG, PUT, EOF, STOP, FAIL, NOCASE, ELSE, CLR,
                     ADD, PUTS, CAT, SWITCH, CSWITCH, USE, GLOBAL, CASE,
                     BACK.
- state names      - sequences of letters, digits and underscores which
                     don't start with a digit.
- strings          - sequences of any characters (including escapes)
                     enclosed in double quotes. They cannot extend beyond
                     one line unless the end of line is immediately
                     preceded by a backslash (ignored escape sequence).
- escape sequences - beginning with a backslash:
                       \t - tab           \n - newline
                       \\ - backslash     \' - '
                       \r - return        \e - escape (char #27)
                       \" - "
                     In addition to these rules, a backslash followed
                     immediately by a newline is ignored, allowing long
                     commands to be split over several lines of source
                     text.

The language itself is not case-sensitive, but the rules written in it
usually are.

Program:
--------

Each program consists of so-called states.
The interpreter is in exactly one state at any moment. The conversion
starts in the first state regardless of its name. It isn't necessary to
name the first state unless a GLOBAL precedes it (see "Global definitions"
below).

'Simple' state definition:
--------------------------

<state name>: <commands>

When this type of state is entered, the <commands> are executed and the
program stops, unless the command sequence ends with the name of another
state in which the program is to continue. The commands may be separated by
semicolons or newlines; the choice of separator doesn't affect their
execution in any way.

'Complex' state definition:
---------------------------

<state name>: [<init commands>]
              [USE <state>]
              {<charlist> <commands> <sep>}
              [ELSE <commands>]

In this case, the <init commands> are executed, then one character is read
from the input and the interpreter finds the <commands> corresponding to
that character. If no <charlist> contains the character just read, the
<commands> after ELSE are executed. Init commands may be separated by
<sep>, but it doesn't affect their execution in any way.

<charlist> - a list of characters separated by white space. It may contain
             the EOF keyword, which stands for the END OF FILE condition.
<commands> - any command list (see below). If it is not specified, the
             input character is thrown away. If no next state is specified,
             the current state is used again. There is one exception to
             these rules: the default action for EOF is the STOP command,
             which stops execution immediately.
<sep>      - separator: semicolon or end of line

The ELSE part may be omitted; it is then automatically replaced by ELSE
COPY (the current character is copied to the output stream without any
changes).

The USE keyword causes the character conditions of the current state to be
derived from the given state, which may contain ONLY character conditions
(that is, NO initial commands).
Warning: the ELSE <commands> phrase has _no_effect_, because all character
conditions are set from the state we derive from.

Another warning: if there are some conditions with no next state (using the
same state as the next one), they will contain the original state as their
destination (they were fixed before the USE command). Therefore

    alpha:  '1' '2' put '@'
    gamma:  use alpha
            '4' put '!'
            else put '>'

is equivalent to:

    alpha:  '1' '2' put '@'
    gamma:  '1' '2' put '@' alpha ; '4' put '!' gamma ; else copy alpha

As you can see above, the USE command is of very limited use and will
probably be improved in future releases.

Command lists:
--------------

{<command>} [<state name>]

This means that the commands are executed in their natural order and then
the converter continues with <state name>, if there is one.

Basic commands:
---------------

COPY              - copy the most recently read character to the output
                    stream
PUT <character>   - copy the given character to the output stream
> <character>     - synonym for PUT
STOP              - stop conversion
FAIL              - stop conversion and exit with RC=10
CLR               - clear the contents of the string buffer
ADD               - append the most recently read character to the string
                    buffer (at most 255 characters)
PUTS              - put the contents of the string buffer into the output
                    stream
BACK              - push the most recently read character back to the input
                    stream. You may do this only ONCE until the character
                    has been read again.

Other commands:
---------------

PUT <string>      - copy the string to the output stream
CAT <string>      - append the string to the string buffer
MSG <string>      - copy the string to standard output

Switching:
----------

You may test the contents of the string buffer with the SWITCH and CSWITCH
commands. These tests may appear only in the <init commands> of a state.
The only difference between SWITCH and CSWITCH is that CSWITCH is
case-sensitive.

SWITCH {<string> <state>} ELSE <the command list continues here>

The interpreter compares the current contents of the string buffer with the
strings in the SWITCH command (in first-to-last order).
Then it goes to the <state> defined for the first string that is equal to
the buffer. If no string matches the buffer, execution continues after the
ELSE keyword. For example:

    CSWITCH "aaa" aaa ; "bbb" bbb ; "aaa" ccc ELSE STOP

never calls the state ccc.

Global definitions:
-------------------

It's possible to write conditions that affect the characters read in ALL
states (but these conditions may be overridden in individual states by
simply redefining them). These global conditions are called GLOBALs and are
defined before the first state of the program.

GLOBAL {<charlist> [<basic command>] [<state>]} [ELSE <basic command> <state>]

There are two differences between standard state definitions and GLOBALs:

(1) GLOBALs can contain only basic commands (not the more complex ones).
(2) References to the current state are not fixed immediately, so you can
    say that you want to convert each 'A' into 'a' without changing the
    current state. This is why GLOBALs are usually preferable to USEs.

Case sensitivity:
-----------------

All character comparisons are case-sensitive. If you say NOCASE before a
<charlist>, each letter in each <charlist> in the current state (unless you
say CASE) will automatically be converted to both cases ('a' becomes
'a' 'A').

Final words:
------------

Send bug reports, comments and nice conversion tables to
mjsoft@k332.feld.cvut.cz.

This language is very simple and probably doesn't allow everything you
need. The command set will be extended in some future version (if I have
some free time to do it): numeric registers and more string registers will
be added, the USE mechanism will be extended to be slightly more
user-friendly and ... (mail me what you would like to be added).

Thanks to Short Software for some good ideas.
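
For readers who want to experiment with the 'complex' state rules described
above without an Amiga at hand, the following is a minimal sketch in Python.
It is not TEC and is not derived from its source: the names run, STATES and
EOF and the table layout are hypothetical, only a few of the basic commands
are modelled, and stopping unconditionally at end of input is a simplifying
assumption. It does follow the documented defaults: a missing ELSE behaves
like ELSE COPY, an empty command list throws the character away, a missing
next state keeps the current state, and EOF defaults to STOP.

```python
import string

EOF = None  # stand-in for TEC's EOF condition (assumption of this sketch)


def run(states, start, text):
    """Run a table of 'complex' states over text and return the output.

    states maps a state name to a dict with
      "rules": {character: (commands, next_state)}
      "else":  (commands, next_state)   # optional, defaults to COPY
    A next_state of None means "use the current state again".
    """
    out = []   # output stream
    buf = []   # the single string register
    state = start
    pos = 0
    while True:
        ch = text[pos] if pos < len(text) else EOF
        pos += 1
        spec = states[state]
        if ch in spec["rules"]:
            commands, nxt = spec["rules"][ch]
        elif ch is EOF:
            commands, nxt = ["STOP"], None          # default action for EOF
        else:
            commands, nxt = spec.get("else", (["COPY"], None))  # ELSE COPY
        for cmd in commands:
            if cmd == "STOP":
                return "".join(out)
            if cmd == "COPY" and ch is not EOF:
                out.append(ch)
            elif cmd == "ADD" and ch is not EOF:
                buf.append(ch)       # real TEC limits the buffer to 255 chars
            elif cmd == "CLR":
                buf.clear()
            elif cmd == "PUTS":
                out.extend(buf)
            elif isinstance(cmd, tuple) and cmd[0] == "PUT":
                out.append(cmd[1])   # ("PUT", "x") models PUT 'x'
        if ch is EOF:
            return "".join(out)      # simplification: always stop at EOF
        if nxt is not None:
            state = nxt


# Two states that strip ANSI sequences of the form ESC [ ... <letter>.
STATES = {
    # ESC: throw the character away and go to "esc"; everything else: COPY.
    "text": {"rules": {"\x1b": ([], "esc")}},
    # A letter ends the sequence; any other character is thrown away.
    "esc": {
        "rules": {c: ([], "text") for c in string.ascii_letters},
        "else": ([], None),
    },
}

print(run(STATES, "text", "a\x1b[1mb\x1b[0mc"))  # prints: abc
```

The table-driven layout mirrors the way a complex state pairs each
<charlist> with a command list and an optional next state; SWITCH, BACK,
USE and GLOBAL are deliberately left out to keep the sketch short.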